Dynamic Subspace Clustering for Very Large High-Dimensional Databases
Authors
Abstract
Emerging high-dimensional data mining applications need to find interesting clusters embedded in arbitrarily aligned subspaces of lower dimensionality. High-dimensional data objects are difficult to cluster when they are sparse and skewed. Updates are common in dynamic databases and are usually processed in batch mode. In very large dynamic databases, cluster analysis must be performed incrementally, on the updates alone. We present an incremental algorithm for subspace clustering in very high dimensions that handles both insertions and deletions of data points in the backend database.
Similar resources
ISC–Intelligent Subspace Clustering, A Density Based Clustering Approach for High Dimensional Dataset
Many real-world data sets consist of a very high dimensional feature space. Most clustering techniques use the distance or similarity between objects as a measure to build clusters. But in high dimensional spaces, distances between points become relatively uniform. In such cases, density based approaches may give better results. Subspace Clustering algorithms automatically identify lower dimens...
Subspace outlier mining in large multimedia databases
Increasingly large multimedia databases in life sciences, e-commerce, or monitoring applications cannot be browsed manually, but require automatic knowledge discovery in databases (KDD) techniques to detect novel and interesting patterns. Clustering aims at grouping similar objects into clusters, separating dissimilar objects. Density-based clustering has been shown to detect arbitrarily shaped...
Mafia: Efficient and Scalable Subspace Clustering for Very Large Data Sets
Clustering techniques are used in database mining for finding interesting patterns in high dimensional data. These are useful in various applications of knowledge discovery in databases. Some challenges in clustering for large data sets in terms of scalability, data distribution, understanding end-results, and sensitivity to input order, have received attention in the recent past. Recent approach...
Multi-view Subspace Clustering for High-dimensional Data
Data today tends toward more observations and very high dimensions. Large high-dimensional data are usually sparse and contain many classes/clusters. For example, large text data in the vector space model often contains many classes of documents represented in thousands of terms. It has become a rule rather than the exception that clusters in high-dimensional data occur in subspaces of data, ...
Mining Subspace Clusters: Enhanced Models, Efficient Algorithms and an Objective Evaluation Study
In the knowledge discovery process, clustering is an established technique for grouping objects based on mutual similarity. However, in today’s applications for each object very many attributes are provided in large and high dimensional databases. As multiple concepts described by different attributes are mixed in the same data set, clusters are hidden in subspace projections and do not appear ...
Publication date: 2003